Cross-validation and cross-study validation of kidney cancer with machine learning and whole exome sequences from the National Cancer Institute

Authors

  • Abdulrhman Aljouie
  • Usman Roshan
  • Nihir Patel
Abstract

Accurate cancer risk prediction from genetic and environmental variables is a key problem in medicine. One approach is to use somatic mutations, which could potentially be used in early detection and prevention. SNP-based studies are the most common ones utilizing this approach; however, most lack a cross-study validation component across at least two independent studies. Here we explore the cross-validation and cross-study validation of predicting kidney cancer cases and controls with SNPs obtained from whole exome sequences at the National Cancer Institute. From the Genomic Data Commons (GDC) portal we obtained aligned whole exome sequences of two different kidney cancer studies: 110 cases and controls from KIRP (renal papillary cell carcinoma) and 34 cases and controls from KICH (kidney chromophobe cell carcinoma). We performed a rigorous quality control procedure to obtain SNPs and ranked them with feature selection. On the top-ranked SNPs, the support vector machine obtains a cross-validation accuracy of 0.71 (with 10 SNPs) in KIRP and 0.72 (with 20 SNPs) in KICH. We then learn a model on KIRP and, with 10 SNPs, achieve an accuracy of 0.66 on the KICH samples. Our work shows that we can predict kidney chromophobe carcinoma from a kidney papillary carcinoma dataset better than a random classifier, which would have an accuracy of 0.5. In continuing work we are expanding these sample sizes and extending the cross-study validation to other kidney cancer datasets in the NCI GDC portal.
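The difference between cross-validation (train and test within one study) and cross-study validation (train on one study, test on another) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' pipeline: the genotype data are synthetic, SNPs are ranked with a chi-square score (the abstract does not name the feature selection method), and the classifier is a small linear SVM trained by Pegasos-style subgradient descent rather than whatever SVM implementation the authors used. All function names are hypothetical.

```python
import random

def chi2_score(genotypes, labels):
    """Chi-square statistic for a 2 x 3 table (label 0/1 vs genotype 0/1/2)."""
    counts = [[0, 0, 0], [0, 0, 0]]
    for g, y in zip(genotypes, labels):
        counts[y][g] += 1
    n = len(labels)
    score = 0.0
    for y in (0, 1):
        for g in (0, 1, 2):
            expected = sum(counts[y]) * (counts[0][g] + counts[1][g]) / n
            if expected > 0:
                score += (counts[y][g] - expected) ** 2 / expected
    return score

def top_snps(X, y, k):
    """Column indices of the k SNPs with the highest chi-square score."""
    scores = [chi2_score([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]

def take(X, cols):
    """Keep only the selected SNP columns."""
    return [[row[j] for j in cols] for row in X]

def train_linear_svm(X, y, epochs=20, lam=0.01, seed=0):
    """Linear SVM via stochastic subgradient descent on the hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)            # last weight acts as the bias
    order, t = list(range(len(X))), 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)          # decaying step size
            x = X[i] + [1]
            sign = 1 if y[i] == 1 else -1
            margin = sign * sum(wj * xj for wj, xj in zip(w, x))
            w = [wj * (1 - eta * lam) for wj in w]
            if margin < 1:                 # sample violates the margin
                w = [wj + eta * sign * xj for wj, xj in zip(w, x)]
    return w

def fit(X, y, k):
    """Rank SNPs on the training data only, then train on the top k."""
    cols = top_snps(X, y, k)
    return cols, train_linear_svm(take(X, cols), y)

def score(model, X, y):
    """Fraction of samples classified correctly."""
    cols, w = model
    preds = [1 if sum(wj * xj for wj, xj in zip(w, row + [1])) > 0 else 0
             for row in take(X, cols)]
    return sum(p == t for p, t in zip(preds, y)) / len(y)

def cross_validate(X, y, k, folds=5, seed=0):
    """Mean held-out accuracy; SNP ranking is redone inside every fold."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    accs = []
    for f in range(folds):
        test = idx[f::folds]
        held = set(test)
        train = [i for i in idx if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train], k)
        accs.append(score(model, [X[i] for i in test], [y[i] for i in test]))
    return sum(accs) / folds

def synthetic_study(n, n_snps, seed):
    """Made-up genotypes (0/1/2); only the first three SNPs carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2                      # alternate controls and cases
        row = [rng.choice((0, 1, 2)) for _ in range(n_snps)]
        for j in range(3):
            if rng.random() < 0.7:
                row[j] = 2 * label
        X.append(row)
        y.append(label)
    return X, y

# Two synthetic "studies" standing in for two independent cohorts.
X_a, y_a = synthetic_study(60, 40, seed=1)
X_b, y_b = synthetic_study(40, 40, seed=2)

cv_acc = cross_validate(X_a, y_a, k=10)    # within-study estimate
model = fit(X_a, y_a, k=10)                # train on all of study A
cross_study_acc = score(model, X_b, y_b)   # evaluate on unseen study B

print(f"cross-validation accuracy on study A: {cv_acc:.2f}")
print(f"cross-study accuracy (A -> B): {cross_study_acc:.2f}")
```

In this sketch the SNP ranking is recomputed on each training fold, so held-out samples never influence which SNPs are selected. Whether the original study ranked SNPs this way is not stated in the abstract.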


Similar resources

Cross-validation and cross-study validation of chronic lymphocytic leukaemia with exome sequences and machine learning

The era of genomics brings the potential of better DNA-based risk prediction and treatment. We explore this problem for chronic lymphocytic leukaemia, which is one of the largest whole exome data sets available from the NIH dbGaP database. We perform a standard next-generation sequence procedure to obtain Single-Nucleotide Polymorphism (SNP) variants and obtain a peak mean accuracy of 82% in our c...


QSAR Study of 17β-HSD3 Inhibitors by Genetic Algorithm-Support Vector Machine as a Target Receptor for the Treatment of Prostate Cancer

The 17β-HSD3 enzyme plays a key role in treatment of prostate cancer and small inhibitors can be used to efficiently target it. In the present study, the multiple linear regression (MLR) and support vector machine (SVM) methods were used to interpret the chemical structural functionality against the inhibition activity of some 17β-HSD3 inhibitors. Chemical structural information was described thro...


Prostate cancer radiomics: A study on IMRT response prediction based on MR image features and machine learning approaches

Introduction: To develop different radiomic models based on radiomic features and machine learning methods to predict early intensity modulated radiation therapy (IMRT) response. Materials and Methods: Thirty prostate patients were included. All patients underwent pre- and post-IMRT T2-weighted and apparent diffusion coefficient (ADC) magnetic resonance imagi...


Oncofuse: a computational framework for the prediction of the oncogenic potential of gene fusions

MOTIVATION: Gene fusions resulting from chromosomal aberrations are an important cause of cancer. The complexity of genomic changes in certain cancer types has hampered the identification of gene fusions by molecular cytogenetic methods, especially in carcinomas. This is changing with the advent of next-generation sequencing, which is detecting a substantial number of new fusion transcripts in i...





Publication date: 2017